Automation of Summarization Evaluation Methods and their Application to the Summarization Process
نویسندگان
چکیده
Summarization is the process of creating a more compact textual representation of a document or a collection of documents. In view of the vast increase in electronically available information sources in the last decade, filters such as automatically generated summaries are becoming ever more important to facilitate the efficient acquisition and use of required information. Different methods using natural language processing (NLP) techniques are being used to this end. One of the shallowest approaches is the clustering of available documents and the representation of the resulting clusters by one of the documents; an example of this approach is the Google News website. It is also possible to augment the clustering of documents with a summarization process, which would result in a more balanced representation of the information in the cluster, NewsBlaster being an example. However, while some systems are already available on the web, summarization is still considered a difficult problem in the NLP community. One of the major problems hampering the development of proficient summarization systems is the evaluation of the (true) quality of system-generated summaries. This is exemplified by the fact that the current state-of-the-art evaluation method to assess the information content of summaries, the Pyramid evaluation scheme, is a manual procedure. In this light, this thesis has three main objectives. 1. The development of a fully automated evaluation method. The proposed scheme is rooted in the ideas underlying the Pyramid evaluation scheme and makes use of deep syntactic information and lexical semantics. Its performance improves notably on previous automated evaluation methods. 2. The development of an automatic summarization system which draws on the conceptual idea of the Pyramid evaluation scheme and the techniques developed for the proposed evaluation system. The approach features the algorithm for determining the pyramid and bases importance on the number of occurrences of the variable-sized contributors of the pyramid as opposed to word-based methods exploited elsewhere. 3. The development of a text coherence component that can be used for obtaining the best ordering of the sentences in a summary.
منابع مشابه
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
متن کاملSystematic literature review of fuzzy logic based text summarization
Information Overloadrq is not a new term but with the massive development in technology which enables anytime, anywhere, easy and unlimited access; participation & publishing of information has consequently escalated its impact. Assisting userslq informational searches with reduced reading surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...
متن کاملText Summarization Using Cuckoo Search Optimization Algorithm
Today, with rapid growth of the World Wide Web and creation of Internet sites and online text resources, text summarization issue is highly attended by various researchers. Extractive-based text summarization is an important summarization method which is included of selecting the top representative sentences from the input document. When, we are facing into large data volume documents, the extr...
متن کاملGraph Hybrid Summarization
One solution to process and analysis of massive graphs is summarization. Generating a high quality summary is the main challenge of graph summarization. In the aims of generating a summary with a better quality for a given attributed graph, both structural and attribute similarities must be considered. There are two measures named density and entropy to evaluate the quality of structural and at...
متن کاملEXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS
Due to the explosive growth of the world-wide web, automatictext summarization has become an essential tool for web users. In this paperwe present a novel approach for creating text summaries. Using fuzzy logicand word-net, our model extracts the most relevant sentences from an originaldocument. The approach utilizes fuzzy measures and inference on theextracted textual information from the docu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011